Abstract

Background: Protein-lipid interactions play essential roles in the conformational stability and biological functions 
of membrane proteins. However, few of the previous computational studies have taken into account the atomic 
details of protein-lipid interactions explicitly. 

Results: To gain an insight into the molecular mechanisms of the recognition of lipid molecules by membrane 
proteins, we investigated amino acid propensities in membrane proteins for interacting with the head and tail 
groups of lipid molecules. We observed a common pattern of lipid tail-amino acid interactions in two different 
data sources, crystal structures and molecular dynamics simulations. These interactions are largely explained by 
general lipophilicity, whereas the preferences for lipid head groups vary among individual proteins. We also found 
that membrane and water-soluble proteins utilize essentially an identical set of amino acids for interacting with 
lipid head and tail groups. 

Conclusions: We showed that the lipophilicity of amino acid residues determines the amino acid preferences for 
lipid tail groups in both membrane and water-soluble proteins, suggesting that tightly-bound lipid molecules and 
lipids in the annular shell interact with membrane proteins in a similar manner. In contrast, interactions between 
lipid head groups and amino acids showed a more variable pattern, apparently constrained by each protein's 
specific molecular function. 



Background 

About 20-30% of all proteins encoded in a typical genome
are estimated to be localized in membranes [1,2],
where protein-lipid interactions play crucial roles in the
conformational stability and biological functions of
membrane proteins. Many experimental studies have
suggested that the physico-chemical properties of the
membrane lipid bilayer influence the stability and function
of membrane proteins. The thermal [3,4] and chemical [5]
stability of the potassium channel KcsA has been shown
to vary according to the lipid composition of the
membrane bilayer. It has also been shown that the lipid
composition affects protein functions, including ion
transport in KcsA [6,7] and the Ca2+-ATPase of
sarcoplasmic reticulum [8,9], phosphorylation by
diacylglycerol kinase [10] and chemical compound transport
by the mechanosensitive channel of large conductance



* Correspondence: kenji@nibio.go.jp 

Department of Fundamental Research, National Institute of Biomedical
Innovation, 7-6-8 Saito Asagi, Ibaraki, Osaka, Japan

Full list of author information is available at the end of the article 



MscL [11]. To complement these experimental studies, 
statistical analyses have been carried out to reveal amino 
acid preferences and conservation patterns within the 
lipid bilayer environment [12-16] using available 
sequence and structural data. The patterns emerging 
from these statistical analyses should reflect implicitly 
the effects of lipid molecules on the structural formation 
and stability of membrane proteins. However, few of the 
previous computational studies have taken into account 
the atomic details of protein-lipid interactions explicitly. 
A notable exception is all-atom molecular dynamics
(MD) simulation, which can now be applied to membrane
proteins under conditions mimicking biological membranes
(reviewed recently by Khalili-Araghi and co-authors [17]).
All-atom MD simulations enable us to inspect
protein-lipid interactions in atomic detail [18,19] and can
reveal the role of lipids in protein function [20], albeit
for a small selection of specific lipid and protein molecules.

In this paper, we attempt to understand the nature of 
protein-lipid interactions using a computational
approach. Given the limited number of crystal structures
containing lipid molecules, we decided to combine all 
known biological phospholipids together and classify the 
atomic interactions into those involving the "head" and 
"tail" parts of the lipids. The head and tail groups can 
be found in most phospholipids constituting a biological 
membrane and define one of the most essential chemi- 
cal features of these molecules. Thus, we ask more spe- 
cifically: "How are the head and tail portions of lipid 
molecules recognized by amino acid residues in mem- 
brane proteins?" 

To answer this question, we utilized two available data 
sources, crystal structures and MD trajectories. Using 
the crystal structure data, we can include and examine 
various kinds of proteins and lipids, although the num- 
ber of lipid molecules observed in each solved structure 
is limited. Using the MD data, we can obtain detailed 
information about all the lipid molecules surrounding a 
protein, although such an analysis is possible only for a 
small set of protein and lipid types. The combination of 
these two data sources allows us to assess the biases 
resulting from a limited variety of data in each data 
source. The results revealed a common pattern of lipid 
tail-amino acid interactions observed in both the crystal 
structures and MD trajectories. We show that the recog- 
nition of lipid tails can be explained largely by general 
lipophilicity and that this effect dominates in the two 
different situations represented by the crystal structure 
and MD datasets. In contrast, lipid head groups showed 
a more complicated and diverse pattern and we discuss 
how our observations can be related to known experi- 
mental data and previously proposed concepts concern- 
ing protein-lipid interactions. 

Methods 

Lipid definition and dataset 

Lipids in this paper were defined as phosphoglycerides 
that consisted of one or two fatty acids linked through 
glycerol phosphate to zero or one polar group, and their 
mimetic compounds. First, an initial list of three-letter 
HET IDs of lipids in the Protein Data Bank (PDB) [21] 
was obtained by keyword searches against the Chemical 
Component Dictionary (CCD) through Ligand Expo 
[22] and PDBeChem [23] using all the MeSH terms 
below 'Glycerophosphates' in the MeSH hierarchy. Next, 
mimetic compounds were found by the "Similar Com- 
pound Search" function at PubChem [24] and RCSB 
PDB [25]. Finally, all the collected compounds were 
manually checked to determine whether they met the 
definition of lipids above. A total of 98 HET IDs were 
collected (Table 1) and used to search for proteins in 
contact with lipids in the PDB repository (see the next 
section). 



Table 1 List of HET IDs for the phospholipids considered
in this paper

Crystal structure data of protein-lipid complexes

Using the HET IDs listed in Table 1, a local repository 
of the PDB (updated on February 9, 2011) was scanned 
for the crystal structures of proteins that contained 
these lipid molecules. Retaining only those structures 
solved at 4.0 Å resolution or better (ignoring structures
solved by NMR and other methods, for which resolution 
was unavailable), a total of 290 protein-lipid complexes 
were obtained initially, consisting of 1,657 chains.
Protein chains that were shorter than 30 residues, that
contained one or more non-standard amino acid residues
(except for selenomethionine, which was treated as
MET) or that had no lipid contacts (see below for the
definition of contacts) were removed from this set,
leaving 1,497 protein chains. These sequences were
clustered using the BLASTClust program (available from
the BLAST [26] distribution) at a 25% sequence identity 
cutoff, resulting in 148 clusters. Clusters in which all the 
members had less than five residues in contact with 
lipids were discarded. The remaining clusters were clas- 
sified into transmembrane (TM) and non-transmem- 
brane (non-TM) in the following manner. A cluster was 
initially annotated as either TM, if any of its members 
was found in the PDBTM [27] or OPM databases [28] 
(both downloaded on February 6, 2011), or non-TM 
otherwise. To confirm the presence (or absence) of TM 
helices, PDB2TMD [29] was run, followed by manual 
inspection to ensure that all the proteins were correctly 
annotated as TM or non-TM. From each cluster, the 
protein chain with the highest number of lipid-contact- 
ing residues was selected as the representative, produ- 
cing 45 TM and 27 non-TM protein chains (Table 2). 
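
The cluster filtering and representative selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input layout (a list of clusters plus a dictionary of contact counts) and the example chain IDs with their counts are hypothetical and are not taken from the actual pipeline.

```python
def select_representatives(clusters, contact_counts, min_contacts=5):
    """Pick one representative chain per sequence cluster.

    clusters       : list of lists of chain IDs (hypothetical input layout)
    contact_counts : dict mapping chain ID -> number of lipid-contacting residues
    min_contacts   : clusters whose members all have fewer contacts are dropped
    """
    representatives = []
    for members in clusters:
        # The chain with the most lipid-contacting residues represents the cluster.
        best = max(members, key=lambda chain: contact_counts.get(chain, 0))
        # If even the best member is below the threshold, every member is.
        if contact_counts.get(best, 0) < min_contacts:
            continue
        representatives.append(best)
    return representatives


# Hypothetical example data (chain IDs and counts are made up for illustration).
example_clusters = [["xxxx_A", "xxxx_B"], ["yyyy_A"], ["zzzz_A", "zzzz_C"]]
example_counts = {"xxxx_A": 7, "xxxx_B": 12, "yyyy_A": 3, "zzzz_A": 5, "zzzz_C": 5}
chosen = select_representatives(example_clusters, example_counts)
```

In this sketch a tie between chains is resolved in favor of the first chain listed; the text does not state how ties were handled, so that choice is an assumption.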

Although the resolution cutoff for data collection was
set to 4.0 Å, the worst resolution of any included
structure was 3.7 Å. Also, only two protein chains in
the TM dataset had a resolution worse than 3.5 Å, and
only four had a resolution worse than 3.0 Å. All the
non-TM structures had a resolution of 3.0 Å or better.
Thus, the final list contained mostly proteins solved at a
decent resolution. All the statistical analyses in this paper
were based on these protein chains unless otherwise specified.
Although no conscious selection was made, the protein
chains in the TM dataset were mostly helical, the only
exception being a beta-barrel anion channel protein
(PDB: 3emn).

Table 2 List of transmembrane (TM) and non-transmembrane (non-TM) protein chains in complex with lipids

Protein code^a | Protein name | Lipid code^b | Total contacts^c | Chain length

(a) Transmembrane (TM) protein chains
1gzm_A | Rhodopsin | PEF | 5 | 329
1kqg_B | Formate dehydrogenase-N (Fdn-N) | CDL | 7 | 289
1kqg_C | Formate dehydrogenase-N (Fdn-N) | CDL | 5 | 216
1m56_D | Cytochrome c oxidase | PEH | 16 | 42
1nen_C | Succinate dehydrogenase (SQR) | CDN, EPH | 19 | 129
1nen_D | Succinate dehydrogenase (SQR) | CDN | 9 | 113
1pp9_Q | Cytochrome bc1 complex | CDL, PEE | 15 | 241
1pp9_T | Cytochrome bc1 complex | CDL, PEE | 12 | 76
1vf5_D | Cytochrome b6f complex | OPC | 5 | 168
1vf5_N | Cytochrome b6f complex | OPC | 6 | 202
1x0i_1 | Bacteriorhodopsin (BR) | L3P | 14 | 215
1xio_A | Sensory rhodopsin (SR) | PEE | 32 | 217
1zoy_D | Succinate:ubiquinone oxidoreductase (SQR) | EPH | 10 | 102
2b6o_A | Aquaporin-0 (AQP0) | MC3 | 27 | 235
2bl2_l | Vacuolar-type (V-type) sodium ion-pumping adenosine triphosphatase (Na+-ATPase) | LHG | 14 | 156
2brd_A | Bacteriorhodopsin (BR) | DPG | 47 | 222
2c3e_A | ADP/ATP translocase 1 | CDL | 40 | 293
2e75_B | Cytochrome b6f complex | OPC | 13 | 160
2e76_F | Cytochrome b6f complex | OPC | 5 | 32
2eau_A | Ca2+-ATPase | PTY | 19 | 994
2eim_W | Cytochrome c oxidase | CDL | 5 | 58
2ein_0 | Cytochrome c oxidase | CDL, PEK, PSC | 18 | 226
2h89_C | Succinate:ubiquinone oxidoreductase (SQR) | PEE | 5 | 139
2hg3_H | Reaction center | CDL, PC9 | 14 | 240
2hh1_L | Reaction center | CDL, PC7, PC9 | 10 | 281
2hhk_M | Reaction center | CDL, PGK, PGT | 23 | 302
2irv_B | Rhomboid protease (GlpG) | PGV | 11 | 179
2r9r_B | Voltage-dependent K+ (Kv) channel | PGW | 35 | 386
2wll_B | Potassium channel (Kir) | PLC | 6 | 266
2z73_B | Rhodopsin | PC1 | 5 | 347
3a7k_A | Halorhodopsin (HR) | L1P, L3P | 33 | 259
3abl_N | Cytochrome c oxidase | CDL, PEK, PGV, PSC | 29 | 513
3abl_P | Cytochrome c oxidase | CDL, PEK, PGV | 85 | 259
3abm_G | Cytochrome c oxidase | CDL, PEK, PGV | 25 | 83
3ag4_Z | Cytochrome c oxidase | PGV | 5 | 43
3bz2_A | Photosystem II (PSII) | LHG | 6 | 335
3bz2_C | Photosystem II (PSII) | LHG | 5 | 447
3bz2_D | Photosystem II (PSII) | LHG | 6 | 340
3cx5_C | Cytochrome bc1 complex | 6PH, 7PH, 8PE, 9PE, CN3, CN5 | 42 | 385
3ddl_B | Xanthorhodopsin (XR) | PCW, PX4 | 6 | 250
3eam_C | Bacterial ligand-gated ion channel homologue (GLIC) | PC1 | 23 | 311
3egw_C | Nitrate reductase A (NarGHI) | AGA | 10 | 224
3emn_X | Voltage-dependent anion channel (VDAC) 1 | MC3 | 7 | 283
3h1j_R | Cytochrome bc1 complex | PEE, PLC | 11 | 196
3h1j_W | Cytochrome bc1 complex | PEE, PLC | 9 | 59

(b) Non-transmembrane (non-TM) protein chains
1bp1_A | Bactericidal/permeability-increasing protein (BPI) | PC1 | 47 | 456
1bwo_A | Nonspecific lipid transfer protein (ns-LTP1) | LPC | 20 | 90

a Protein ID and chain ID.
b HET IDs of all contacting lipids in the complex.
c Total number of residues with lipid contacts.

MD simulation data 

MD simulations were carried out for three TM proteins:
the protein-conducting channel Thermus thermophilus
SecYE (ttSecYE) [30], the Ca2+-ATPase of skeletal muscle
sarcoplasmic reticulum [31] and Methanococcus jannaschii
SecYEβ (mjSecYEβ) [32]. The membrane lipids were POPC
(palmitoyl-oleoyl-phosphatidylcholine) for ttSecYE and
mjSecYEβ, and DOPC (dioleoyl-phosphatidylcholine) for
Ca2+-ATPase. MD trajectory data were obtained from
all-atom simulations of these proteins in a fully hydrated
lipid bilayer using the isothermal-isobaric ensemble (NPT)
and the constant-area isothermal-isobaric ensemble (NPAT)
[20,30,33]. The total simulation length was 100 ns for
each simulation run. A total of 1,000 snapshots taken
every 100 ps were used for the analysis.



Amino acid-lipid contacts and propensity scores 

Various types of amino acid-lipid contacts exist in
protein-lipid complexes. They were broadly grouped into
(1) hydrogen bonds, (2) van der Waals contacts and (3) salt
bridges. These contacts were defined by using the
HBPLUS program [34] with the standard atomic radii
from the PDB het dictionary [35]. The default definitions
of van der Waals interactions and hydrogen bonds
were used to identify the amino acid-lipid contacts.
According to the algorithm used in HBPLUS, hydrogen
atoms were first added to the protein structure and then
a hydrogen bond was identified if (i) the donor-acceptor
distance was less than 3.9 Å, (ii) the hydrogen-acceptor
distance was less than 2.5 Å and (iii) all three angles
D-H-A, D-A-AA and H-A-AA were greater than 90°. (D, A, H
and AA stand for donor, acceptor, hydrogen and acceptor
antecedent, respectively.) For aromatic interactions, the
angles D-A-AX and H-A-AX (for amino-aromatic interactions)
were also required to be less than 20°. (Further
details and a list of acceptor and donor atoms can be
found at [36].) The amino acid residue-lipid contacts
were further classified into lipid tail and head group
contacts. Specifically, the tail group of a lipid was
defined as the set of all the atoms from the aliphatic tail
to the carbon atom next to the carbonyl group of the
fatty acid (or the corresponding carbon atom in a
mimetic lipid). The head group of a lipid was defined as
all the other atoms. The tail groups are predominantly
hydrophobic, while the head groups are hydrophilic.
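
As an illustration of the geometric hydrogen bond criteria quoted above, the following Python sketch tests a single donor-hydrogen-acceptor arrangement. It is not the HBPLUS code: the function names and the coordinate tuples are hypothetical, the hydrogen-acceptor cutoff is read as "less than 2.5 Å" (an assumption), and the additional aromatic (AX) criteria are not covered.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def angle(a, b, c):
    """Angle a-b-c in degrees, measured at the vertex b."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to protect acos against rounding error.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hydrogen_bond(donor, hydrogen, acceptor, antecedent,
                     max_da=3.9, max_ha=2.5, min_angle=90.0):
    """Apply the distance and angle criteria listed in the text (distances in Å)."""
    return (distance(donor, acceptor) < max_da
            and distance(hydrogen, acceptor) < max_ha
            and angle(donor, hydrogen, acceptor) > min_angle      # D-H-A
            and angle(donor, acceptor, antecedent) > min_angle    # D-A-AA
            and angle(hydrogen, acceptor, antecedent) > min_angle)  # H-A-AA


# Hypothetical coordinates (Å) for a near-linear hydrogen bond.
close_pair = is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                              (2.9, 0.0, 0.0), (3.5, 1.1, 0.0))
# Same geometry with the acceptor moved beyond the 3.9 Å cutoff.
far_pair = is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                            (4.5, 0.0, 0.0), (5.1, 1.1, 0.0))
```

Here `close_pair` satisfies all five conditions, whereas `far_pair` fails on the donor-acceptor distance alone.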

All contact preferences were measured in terms of a
propensity score. First, a propensity score for each of
the 20 amino acid residues was computed for each protein.
The propensity $P_i$ of residue type i (e.g., LYS; i = 1
... 20) in a protein was defined as the fraction of
residues of type i in contact with lipids, normalized by
the overall fraction of residues in contact with
lipids:

$$P_i = \frac{N_i^b / N_i}{N^b / N} \qquad (1)$$

where $N_i^b$ is the number of lipid binding amino acid
residues of type i, $N_i$ is the total number of amino acids
of type i, $N^b$ is the total number of lipid binding
residues and $N$ is the total number of amino acid residues.
All the counts were made within the given protein
sequence. The propensity values range between 0 and
∞. An amino acid propensity value of 1 indicates a
neutral preference for binding lipids, while propensity values
of <1 and >1 show a low and high preference,
respectively. If a residue type was not represented in a protein
chain, its propensity was undefined and excluded from
further statistics. If a particular amino acid type was
present in the chain but was not binding to lipids, its
propensity was 0. Finally, the propensity scores thus
computed for each protein chain were averaged over a
set of proteins to draw a comparison between one set
(e.g., TM) and another (e.g., non-TM). The standard error
of the mean was estimated as $s/\sqrt{n}$, where s is the
sample standard deviation and n is the sample size (i.e., the
number of protein chains in the set considered, for
which the propensity was defined).
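
A minimal Python sketch of the propensity score of Eq. 1 and of the standard error of the mean is given below. The function names and the toy four-residue chain are hypothetical; the sketch only assumes that each residue of a chain is labeled with its type and with a flag saying whether it contacts a lipid.

```python
import math
from collections import Counter


def propensities(residue_types, in_contact):
    """Per-chain propensity: (bound_i / total_i) / (bound / total), as in Eq. 1.

    residue_types : list of residue type names, one per residue of the chain
    in_contact    : parallel list of booleans (True if the residue contacts a lipid)
    Residue types absent from the chain are simply missing from the result,
    which corresponds to an undefined propensity.
    """
    n_total = len(residue_types)
    n_bound = sum(in_contact)
    if n_total == 0 or n_bound == 0:
        raise ValueError("need at least one residue and one lipid contact")
    totals = Counter(residue_types)
    bound = Counter()
    for res, flag in zip(residue_types, in_contact):
        bound[res] += flag
    overall = n_bound / n_total
    return {res: (bound[res] / totals[res]) / overall for res in totals}


def mean_and_sem(values):
    """Mean and standard error of the mean, s / sqrt(n), over protein chains."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")  # sample standard deviation undefined for n = 1
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, s / math.sqrt(n)


# Toy chain of four residues, two of which contact lipids (made-up data).
toy = propensities(["LEU", "LEU", "LYS", "ASP"], [True, False, True, False])
toy_mean, toy_sem = mean_and_sem([1.0, 2.0, 3.0])
```

In the toy chain half of all residues are in contact, so LEU (one of two in contact) is neutral with a propensity of 1, LYS (one of one) has a propensity of 2 and ASP has a propensity of 0.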

We derived all the contact statistics from the entire
protein chains, including the residues in extra-membranous
loops, both because lipid-contacting residues were found in
the TM helices as well as in the loops and because this
allows a natural comparison between the TM and non-TM
proteins. Focusing only on the TM regions would not change the
overall statistics, as most TM proteins considered had
only short loops (with the exception of the MD trajectory
data for Ca2+-ATPase, for which the large extra-membranous
domain was excluded from the analysis).

Chi-square test and statistical significance 

To determine whether a particular amino acid is
statistically significantly over- or under-represented in contact
with lipid head or tail atoms, we pooled all the contact
counts in the TM or non-TM dataset (considering only
those proteins with at least six residues forming a given
type of contacts). The expected number $E_i$ of lipid
binding residues of type i in a given dataset was computed
as

$$E_i = \frac{N_i \, N^b}{N} \qquad (2)$$

where $N_i$, $N^b$ and $N$ were as above but obtained for
the entire dataset. It was then compared with the
observed number $O_i$ of lipid binding residues of type i
by using a Chi-square statistic:

$$\chi_i^2 = \frac{(O_i - E_i)^2}{E_i} \qquad (3)$$

The calculated $\chi_i^2$ values were converted to p-values
using the standard Chi-square table with a single degree
of freedom.
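
The test can be sketched in Python as follows. The function names are hypothetical, and the p-value is obtained from the closed form of the Chi-square survival function for one degree of freedom, erfc(sqrt(x/2)), rather than from a printed table. The sign convention (positive for over-representation) follows the "signed chi-square" column of Table 5. The example uses the TRP head group counts for TM proteins listed in Table 5 (22 observed, 9.12 expected).

```python
import math


def expected_count(n_type, n_bound, n_total):
    """Expected number of lipid-binding residues of one type (Eq. 2)."""
    return n_type * n_bound / n_total


def signed_chi_square(observed, expected):
    """Chi-square statistic (Eq. 3), signed by over- or under-representation."""
    chi2 = (observed - expected) ** 2 / expected
    return chi2 if observed >= expected else -chi2


def p_value_one_dof(chi2):
    """Upper-tail probability of a Chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(abs(chi2) / 2.0))


# TRP, head group contacts, TM proteins (values taken from Table 5).
trp_chi2 = signed_chi_square(22, 9.12)
trp_p = p_value_one_dof(trp_chi2)
```

With these inputs the sketch gives a signed chi-square of about 18.19 and a p-value close to 2.0E-05, in line with the corresponding row of Table 5.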

Propensity in MD trajectories 

To calculate propensity scores from the MD data, a
contact was defined using a non-integer value equal to the
fraction of the snapshots in which the amino acid
residue under consideration was in contact with any lipid
molecule. More precisely, the total number $N^b_{(k)}$ of lipid
binding counts for the kth amino acid residue in each
MD trajectory was defined as

$$N^b_{(k)} = \frac{1}{N_T} \sum_{t=1}^{N_T} I^b_{(k)}(t) \qquad (4)$$

where $N_T$ is the number of snapshots and $I^b_{(k)}(t)$ is 1
if the kth amino acid residue was in contact with any lipid
molecule in snapshot t, and 0 otherwise. For example, within a
trajectory of 1,000 snapshots, if ARG90 is observed to be
interacting with lipids in 300 snapshots, then $N^b_{(ARG90)}$
is 0.3. The total number of lipid binding amino acid residues
of type i (i.e., $N_i^b$ in Eq. 1) can then be obtained by
summing up these quantities over all residues of that type
(e.g., all the ARG residues).
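
The fractional contact counts can be sketched in Python as below. The data layout (one list of 0/1 flags per residue, one flag per snapshot) and the function names are assumptions made for illustration; ARG90 with 300 contacts in 1,000 snapshots is the example from the text, while ARG12 is a made-up second residue.

```python
def fractional_contacts(contact_series):
    """Fraction of snapshots in which each residue contacts any lipid.

    contact_series : dict mapping a residue label to a list of 0/1 flags,
                     one flag per snapshot (hypothetical layout)
    """
    return {res: sum(flags) / len(flags) for res, flags in contact_series.items()}


def bound_count_by_type(fractions, residue_type):
    """Sum the fractional contacts over all residues of the same type."""
    totals = {}
    for res, frac in fractions.items():
        rtype = residue_type[res]
        totals[rtype] = totals.get(rtype, 0.0) + frac
    return totals


# 1,000 snapshots: ARG90 in contact in 300 of them, ARG12 (made up) in 500.
series = {
    "ARG90": [1] * 300 + [0] * 700,
    "ARG12": [1] * 500 + [0] * 500,
}
fractions = fractional_contacts(series)
per_type = bound_count_by_type(fractions, {"ARG90": "ARG", "ARG12": "ARG"})
```

The per-type sums can then be used in place of the integer contact counts when evaluating the propensity of Eq. 1.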

Lipophilicity scales of amino acids 

Comparisons were made between the lipid propensity
scores of residues derived from the TM and MD datasets
and the thermodynamic free energy of transferring
amino acid residues from water to the interface of a
POPC bilayer and to octanol. The latter (called the
lipophilicity scales in this paper) was taken from the data
provided in White and Wimley's paper [37]. For the
lipophilicity scales, we kept the protonation states of
ARG and LYS positive, ASP and GLU negative and HIS
neutral.

Correlation between propensity values of two datasets

Comparisons between residue preferences were made
using scatterplots and Pearson's correlation coefficient
defined as

$$C = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[ n \sum X_i^2 - \left( \sum X_i \right)^2 \right] \left[ n \sum Y_i^2 - \left( \sum Y_i \right)^2 \right]}} \qquad (5)$$

where $X_i$ and $Y_i$ represent propensity (or lipophilicity)
values of residue type i in the two datasets being compared
and n is the number of residue types.

The jackknife estimate of the standard error of the
correlation coefficient was obtained as:

$$\sigma^2 = \frac{N - 1}{N} \sum_{j=1}^{N} \left( C_{(j)} - \langle C \rangle \right)^2 \qquad (6)$$

where $C_{(j)}$ is the correlation coefficient calculated
from data with the jth amino acid type removed and
$\langle C \rangle$ is the mean of N (= 20) such values. The square
root of the quantity in Eq. 6 was shown as the estimated
standard error.

Results 

Amino acid propensities from the crystal structure and 
MD datasets 

Amino acid propensities of membrane proteins for
contacting lipid head and tail groups were derived from
both crystal structures and MD simulations. Figure 1
shows scatterplots between the propensities from the
crystal structure and MD datasets. The correlation
coefficients between these two were 0.81 and 0.95 for the
lipid head and tail group contacts, respectively (see also
Tables 3 and 4). Although good agreement was
observed for both the lipid head and tail group contacts,
some points in the plot for the head group contacts do
not lie close to a straight line (Figure 1a), especially
when compared with the plot for the tail group contacts
(Figure 1b). When the outliers in the head group plot
(TRP, ARG, LYS) were removed, the correlation
coefficient rose to 0.88, a value close to that of the tail group
without TRP (0.90) (see also Additional file 1, Fig. S1).

The contact preferences for lipid head groups had
larger variance among individual proteins than those for tail
groups (see Table 3 and the Discussion section below).
Thus, two of the outliers, LYS and ARG, may be due to
the small number of proteins in the MD dataset;
ttSecYE had more ARG residues than the average in the
crystal structure dataset [16], while mjSecYEβ had more
LYS residues than the average. All these residues
clustered in the membrane interfaces, especially on the
cytoplasmic side. Such a bias would have resulted in the
higher head propensities of LYS and ARG in the MD
dataset, although further analysis is needed to confirm
this notion. Particularly high propensities of TRP were
observed in both scatterplots, suggesting that TRP
residues are more frequently located in the regions that
allow direct contacts with lipid molecules than in other
regions (see Discussion below).

Figure 1 Scatterplots of amino acid propensities for interacting
with lipids derived from crystal structures and MD data. For (a)
lipid head and (b) tail groups. The correlation coefficients were 0.81
and 0.95 for the head and tail groups, respectively.

Specific observations for each amino acid residue 

Here, we describe the lipid head and tail group prefer- 
ences of each amino acid residue observed in both the 
crystal structure and MD datasets (Table 3). 

Only TRP and TYR were favored by both the lipid
head and tail groups. These residues, with their
amphiphilic nature, play a special role in the
membrane-water interfaces. The small residues (GLY, SER, THR,
ALA, PRO) were disfavored by both lipid head and tail
groups. Our previous study showed that the propensities of
the small residues are low on the protein surface in the TM
region and around the membrane interfaces, but high in
the buried positions [16]. These residues are thought to
stabilize inter-helical contacts through non-conventional
hydrogen bonds (Cα-H···O) [16,38]. The acidic residues
(ASP, GLU), but not the basic ones (HIS, ARG, LYS), were
also disfavored by both lipid head and tail groups,
consistent with the favorable occurrence of basic residues on
the surface of the intracellular interface [16] (the
positive-inside rule [39]).

For lipid head group contacts, hydrophilic residues,
both basic (HIS, ARG, LYS) and uncharged polar (ASN,
GLN), were favored, except for the small (SER, THR)
and acidic (ASP, GLU) ones. TRP and TYR were the only
hydrophobic residues favored by lipid head groups. For
lipid tail group contacts, no hydrophilic residues were
favored and all the hydrophobic residues (TRP, PHE,
TYR, LEU, ILE, CYS, MET, VAL) were favored, except
for the small ones (PRO, ALA, GLY).

Table 3 Amino acid propensities for interacting with lipids from crystal structures and MD trajectories, and
lipophilicity scales

Residue | Crystal structures^a, head group | Crystal structures^a, tail group | MD trajectories, head group | MD trajectories, tail group | Lipophilicity scale^b, POPC | Lipophilicity scale^b, octanol
TRP | [illegible] | [illegible] | 5.44 | 4.38 | -1.85 | -2.09
PHE | [illegible] | [illegible] | 0.97 | 1.97 | -1.13 | -1.71
TYR | [illegible] | [illegible] | 2.12 | 1.45 | -0.94 | -0.71
LEU | [illegible] | [illegible] | 0.80 | 1.70 | -0.56 | -1.25
ILE | [illegible] | [illegible] | 0.48 | [illegible] | -0.31 | -1.12
CYS | [illegible] | [illegible] | 0.06 | 1.68 | -0.24 | -0.02
MET | [illegible] | [illegible] | 1.01 | 1.56 | -0.23 | -0.67
GLY | [illegible] (0.20) | 0.47 (0.13) | 0.42 | 0.42 | 0.01 | 1.15
VAL | 0.60 (0.18) | [illegible] (0.41) | 0.40 | 1.23 | 0.07 | -0.46
SER | 0.68 (0.20) | 0.89 (0.25) | [illegible] | [illegible] | 0.13 | 0.46
THR | 0.82 (0.25) | 0.76 (0.15) | 0.43 | 0.35 | 0.14 | 0.25
ALA | 0.54 (0.15) | 0.92 (0.19) | 0.22 | 0.66 | 0.17 | 0.50
HIS | 1.99 (0.39) | 0.53 (0.09) | 1.35 | 0.88 | 0.17 | 0.11
ASN | 1.92 (0.47) | 0.43 (0.13) | 1.57 | 0.31 | 0.42 | 0.85
PRO | 0.56 (0.37) | 0.35 (0.12) | 0.77 | 0.65 | 0.45 | 0.14
GLN | 1.69 (0.98) | 0.51 (0.26) | 1.26 | 0.25 | 0.58 | 0.77
ARG | 2.42 (0.65) | 0.27 (0.21) | 3.85 | 0.43 | 0.81 | 1.81
LYS | 1.64 (0.58) | 0.23 (0.09) | 3.26 | 0.55 | 0.99 | 2.80
ASP | 0.51 (0.32) | 0.06 (0.04) | 0.61 | 0.00 | 1.23 | 3.64
GLU | 0.56 (0.57) | 0.30 (0.13) | 0.93 | 0.12 | 2.02 | 3.63

The amino acids are sorted in ascending order of the lipophilicity scale for the POPC interface.
a Values in parentheses represent the estimated standard error of the mean (see Methods). The average of the standard error is 0.41 for lipid head and 0.27 for tail groups.
b The protonation state of HIS has been taken as neutral. All ARG and LYS are taken as positively charged and all ASP and GLU as negatively charged.

Table 4 Three-way relationships between the amino acid propensities for interacting with lipids from crystal
structures and MD trajectories, and lipophilicity scales

 | Crystal structures, head group | Crystal structures, tail group | MD trajectories, head group | MD trajectories, tail group | Lipophilicity scale^a, POPC | Lipophilicity scale^a, octanol
Crystal structures, head group | 1.00 (0.00) | 0.19 (0.38) | 0.81 (0.06) | 0.32 (0.36) | -0.28 (0.27) | -0.16 (0.25)
Crystal structures, tail group | | 1.00 (0.00) | 0.33 (0.69) | 0.95 (0.05) | -0.87 (0.07) | -0.82 (0.05)
MD trajectories, head group | | | 1.00 (0.00) | 0.49 (0.66) | -0.24 (0.49) | -0.06 (0.40)
MD trajectories, tail group | | | | 1.00 (0.00) | -0.84 (0.05) | -0.75 (0.07)
Lipophilicity scale^a, POPC | | | | | 1.00 (0.00) | 0.92 (0.03)
Lipophilicity scale^a, octanol | | | | | | 1.00 (0.00)

All-against-all correlation coefficients between the properties presented in Table 3. Values in parentheses represent the jackknife standard error of the correlation (see Methods).
a The protonation state of HIS has been taken as neutral. All ARG and LYS are taken as positively charged and all ASP and GLU as negatively charged.

Figure 2 Scatterplots of amino acid propensities for interacting
with lipid tail groups versus the POPC lipophilicity scale.
Propensities for lipid tail groups derived from (a) crystal structures
and (b) MD data. The correlation coefficients were -0.87 and -0.84
for the propensities from crystal structures and MD data,
respectively.

Comparison with the lipophilicity scales 

We then compared the amino acid propensities with the 
experimentally determined lipophilicity scales, which 
were derived from transfer free energies of model pep- 
tides from water to POPC membrane interface and to 
bulk octanol [37]. (The correlation coefficients were cal- 
culated by using the raw values of the amino acid 



propensities and the lipophilicity scales, as described in 
Methods.) The amino acid propensities and the lipophi- 
licity scales are summarized in Table 3, and a compre- 
hensive list of correlation coefficients between the three 
sets of values is shown in Table 4. 

The propensities for the tail group atomic contacts,
derived from both the crystal structure and MD datasets,
were highly correlated with the lipophilicity scales
(with correlation coefficients ranging from -0.75 to -0.87,
Figure 2; the sign is negative because a lower transfer free
energy means a higher lipophilicity). However, the
propensities for the head group atomic contacts were poorly
correlated with the lipophilicity scales (with correlation
coefficients ranging from -0.06 to -0.28).
This observation suggests that the lipid tail group
propensities can be largely described by the free energy of
transfer of model peptides.

Comparison with non-TM data 

Amino acid propensities for contacting with lipids were 
derived also from a set of non-TM proteins and com- 
pared with those derived from the TM dataset. A sum- 
mary of the Chi-square statistics for lipid contacts of all 
20 amino acid residues in the TM and non-TM proteins 
is presented in Table 5. 

Despite some small differences in the degree of prefer- 
ence (e.g., ASN contacts with lipid head groups being 
statistically significant only in the TM dataset), no 
amino acids were exclusively preferred in either dataset. 
Out of the 40 comparisons in Table 5 (for 20 amino 
acids in each type of contacts), only two occurrences 
were found such that the number of observed contacts 
was higher than expected in TM and lower than 
expected in non-TM or vice versa (GLY for the head 
group contacts and CYS for the tail group contacts). 

To summarize, we found that an almost identical set
of amino acids was used to form lipid contacts in the
TM and non-TM proteins, with only small differences
in the statistical significance of over- or
under-representation.

Discussion 

We showed that the patterns of membrane protein-lipid 
interactions obtained from both the crystal structures 
and MD trajectories were highly correlated with each 
other (Figure 1). We also showed that the recognition of
lipid tail groups by amino acid residues can be described
by the lipophilicity scales (Table 4) and showed the same
tendency in non-TM proteins (Table 5), while lipid
head groups demonstrated considerable variation among
individual proteins. We discuss here how our observations
can be associated with existing experimental data
and previously proposed concepts concerning
protein-lipid interactions. We also elaborate on the high
propensities of TRP residues for the membrane protein-lipid
interface.



Morita et al. BMC Biophysics 2011, 4:21
http://www.biomedcentral.com/2046-1682/4/21



Table 5 Lipid contact statistics in TM and non-TM proteins with (a) head group and (b) tail group atoms

Columns 2-6: transmembrane (TM) proteins. Columns 7-11: non-transmembrane (non-TM) proteins.

Residue | Obs^a | Exp^a | Counts | Signed chi-square | P-value | Obs^a | Exp^a | Counts | Signed chi-square | P-value

(a) Head group
TRP | 22 | 9.12 | 290 | 18.19 | 2.00E-05 | 3 | 2.23 | 84 | 0.27 | 6.06E-01
PHE | 35 | 22.65 | 720 | 6.74 | 9.43E-03 | 13 | 9.06 | 341 | 1.72 | 1.90E-01
TYR | 28 | 12.90 | 410 | 17.69 | 2.60E-05 | 21 | 7.25 | 273 | 26.08 | 3.27E-07
LEU | 28 | 37.40 | 1189 | -2.36 | 1.24E-01 | 20 | 21.85 | 823 | -0.16 | 6.92E-01
ILE | 11 | 23.18 | 737 | -6.40 | 1.14E-02 | 9 | 11.58 | 436 | -0.57 | 4.49E-01
CYS | 2 | 4.25 | 135 | -1.19 | 2.76E-01 | 2 | 4.51 | 170 | -1.40 | 2.37E-01
MET | 11 | 11.51 | 366 | -0.02 | 8.80E-01 | 1 | 4.22 | 159 | -2.46 | 1.17E-01
GLY | 15 | 28.06 | 892 | -6.07 | 1.37E-02 | 15 | 13.54 | 510 | 0.16 | 6.92E-01
VAL | 16 | 26.70 | 849 | -4.29 | 3.83E-02 | 14 | 15.11 | 569 | -0.08 | 7.75E-01
SER | 14 | 20.57 | 654 | -2.10 | 1.47E-01 | 13 | 15.61 | 588 | -0.44 | 5.08E-01
THR | 16 | 19.53 | 621 | -0.64 | 4.24E-01 | 5 | 10.62 | 400 | -2.98 | 8.45E-02
ALA | 16 | 29.85 | 949 | -6.42 | 1.13E-02 | 12 | 17.55 | 661 | -1.76 | 1.85E-01
HIS | 16 | 8.05 | 256 | 7.85 | 5.09E-03 | 7 | 5.47 | 206 | 0.43 | 5.13E-01
ASN | 22 | 11.45 | 364 | 9.73 | 1.82E-03 | 11 | 9.00 | 339 | 0.44 | 5.06E-01
PRO | 9 | 15.98 | 508 | -3.05 | 8.09E-02 | 6 | 10.36 | 390 | -1.83 | 1.76E-01
GLN | 14 | 8.30 | 264 | 3.91 | 4.80E-02 | 12 | 9.77 | 368 | 0.51 | 4.76E-01
ARG | 32 | 13.24 | 421 | 26.58 | 2.53E-07 | 20 | 10.41 | 392 | 8.84 | 2.95E-03
LYS | 20 | 12.20 | 388 | 4.98 | 2.56E-02 | 20 | 13.01 | 490 | 3.75 | 5.27E-02
ASP | 6 | 11.73 | 373 | -2.80 | 9.43E-02 | 5 | 11.79 | 444 | -3.91 | 4.80E-02
GLU | 8 | 14.34 | 456 | -2.80 | 9.40E-02 | 7 | 13.04 | 491 | -2.80 | 9.45E-02

(b) Tail group
TRP | 42 | 12.9 | 290 | 65.46 | 5.93E-16 | 9 | 3.07 | 84 | 11.48 | 7.05E-04
PHE | 63 | 32.1 | 720 | 29.82 | 4.75E-08 | 35 | 12.45 | 341 | 40.84 | 1.66E-10
TYR | 21 | 18.3 | 410 | 0.41 | 5.22E-01 | 20 | 9.97 | 273 | 10.10 | 1.49E-03
LEU | 85 | 53.0 | 1189 | 19.37 | 1.08E-05 | 68 | 30.05 | 823 | 47.92 | 4.43E-12
ILE | 48 | 32.8 | 737 | 7.01 | 8.12E-03 | 33 | 15.92 | 436 | 18.33 | 1.86E-05
CYS | 7 | 6.0 | 135 | 0.16 | 6.88E-01 | 5 | 6.21 | 170 | -0.23 | 6.28E-01
MET | 24 | 16.3 | 366 | 3.63 | 5.67E-02 | 13 | 5.81 | 159 | 8.92 | 2.83E-03
GLY | 17 | 39.7 | 892 | -13.01 | 3.10E-04 | 4 | 18.62 | 510 | -11.48 | 7.03E-04
VAL | 47 | 37.8 | 849 | 2.23 | 1.36E-01 | 34 | 20.78 | 569 | 8.42 | 3.72E-03


SER 


26 


29.1 


654 


-0.34 


5.61 E-01 


8 


21.47 


588 


-8.45 


3.65 E-03 


THR 


21 


27.7 


621 


-1.61 


2.05 E-01 


7 


14.61 


400 


-3.96 


4.66E-02 


ALA 


39 


42.3 


949 


-0.25 


6.14E-01 


24 


24.14 


661 


0.00 


9.78E-01 


HIS 


6 


11.4 


256 


-2.56 


1.10E-01 


3 


7.52 


206 


-2.72 


9.92E-02 


ASN 


7 


16.2 


364 


-5.24 


2.21 E-02 


3 


12.38 


339 


-7.11 


7.69E-03 


PRO 


8 


22.6 


508 


-9.46 


2.10E-03 


9 


14.24 


390 


-1.93 


1.65 E-01 


GLN 


6 


11.8 


264 


-2.82 


9.30E-02 


2 


13.44 


368 


-9.73 


1.81 E-03 


ARG 


5 


18.8 


421 


-10.09 


1.49E-03 


7 


14.31 


392 


-3.74 


5.32E-02 


LYS 


4 


17.3 


388 


-10.21 


1.40E-03 


8 


17.89 


490 


-5.47 


1.94E-02 


ASP 


1 


16.6 


373 


-14.68 


1.28E-04 


2 


16.21 


444 


-12.46 


4.16E-04 


GLU 


6 


20.3 


456 


-10.09 


1.49E-03 


3 


17.93 


491 


-12.43 


4.22E-04 



Statistically significant values (p-value <0.05 or 95% significance) are in bold font. The amino acids are sorted in the ascending order of the lipophilicity scale for 
POPC interface. 

f ' * The dagger and double-dagger symbols are used to show residues in which observed contacts are more than expected for TM and non-TM proteins, 
respectively. 

a Obs and Exp stand for observed and expected number of counts, respectively. 
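
The Exp, signed chi-square and P-value columns of Table 5 appear consistent with a simple calculation: Exp is the residue count multiplied by the overall contact rate (total Obs divided by total Counts), the statistic is (Obs - Exp)^2 / Exp carrying the sign of Obs - Exp, and the P-value is the upper tail of a chi-square distribution with one degree of freedom. The Python sketch below illustrates this reading on the TM head-group columns. The formula is inferred from the tabulated numbers rather than taken from the Methods, and the names TM_HEAD and signed_chi_square are hypothetical.

```python
import math

# TM head-group columns of Table 5: residue -> (Obs, Counts)
TM_HEAD = {
    "TRP": (22, 290), "PHE": (35, 720), "TYR": (28, 410), "LEU": (28, 1189),
    "ILE": (11, 737), "CYS": (2, 135), "MET": (11, 366), "GLY": (15, 892),
    "VAL": (16, 849), "SER": (14, 654), "THR": (16, 621), "ALA": (16, 949),
    "HIS": (16, 256), "ASN": (22, 364), "PRO": (9, 508), "GLN": (14, 264),
    "ARG": (32, 421), "LYS": (20, 388), "ASP": (6, 373), "GLU": (8, 456),
}


def signed_chi_square(data):
    """Return residue -> (Exp, signed chi-square, P-value).

    Assumption: Exp scales the residue count by the overall contact rate,
    and the P-value uses one degree of freedom.
    """
    total_obs = sum(obs for obs, _ in data.values())
    total_counts = sum(cnt for _, cnt in data.values())
    results = {}
    for residue, (obs, cnt) in data.items():
        exp = cnt * total_obs / total_counts
        chi = (obs - exp) ** 2 / exp
        # Positive when contacts are over-represented, negative otherwise.
        signed = math.copysign(chi, obs - exp)
        # Upper tail of the chi-square distribution with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(chi / 2.0))
        results[residue] = (exp, signed, p_value)
    return results


stats = signed_chi_square(TM_HEAD)
for residue in ("TRP", "LEU", "ARG"):
    exp, chi, p_value = stats[residue]
    print(f"{residue}: Exp={exp:.2f}  signed chi-square={chi:.2f}  P={p_value:.2e}")
```

Under these assumptions the TRP and LEU rows agree with the tabulated Exp and signed chi-square values to within rounding, which supports the reading but does not confirm that the authors used exactly this procedure.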
Relation of amino acid propensities to lipid-membrane
protein interaction

Since membrane proteins are generally crystallized with 
detergent molecules used for solubilization and purifica- 
tion, the lipid molecules that remain in the crystal are 
considered those that are tightly bound to the mem- 
brane proteins. On the other hand, the lipid molecules 
in the first shell, also known as the annular shell around 
a membrane protein, are in direct contact with the pro- 
tein and form weak and non-specific interactions 
according to spin-label EPR and fluorescence quenching 
experiments [40,41]. Thus, intuitively, the amino acid 
propensities from the crystal structures should corre- 
spond to propensities for interacting with tightly-bound 
lipid molecules, while those from the MD trajectories 
should correspond to propensities for interacting weakly 
with lipid molecules in the annular shell (although some 
of these lipid molecules can be tightly bound). It is, 
therefore, non-trivial that we have observed such a high 
level of correlation between the propensities derived 
from these two datasets (Figure 1). Assuming that the 
tight binding of lipids is achieved by forming a special 
binding pocket on the surface of a protein, the amino 
acid composition of such binding pockets appears to be 
no different from that of other surface positions. This
result implies that no special chemical interaction is
required for tight binding, at least of the tail portion
of lipid molecules, although transmembrane helix packing
may still create a binding pocket for specific lipid
types required for the protein's function.

Experimental studies of the potassium channel KcsA 
[4,42] suggest that the tightly-bound lipids can be essen- 
tial for its stability and function. The amino acid resi- 
dues that interact with these tightly-bound lipids must 
have been selected during the course of evolution.
However, our results suggest that these amino acids were
selected not necessarily for their ability to form
special chemical interactions with lipid tails; rather,
they are general lipid-binding surface residues that
happened to provide a physical basis for strong
interaction.

For the head group contacts, although the TM and 
non-TM datasets produced a similar trend (Table 5), a 
weaker correlation was observed between the propensi- 
ties derived from the crystal structure and MD datasets 
than that for the tail group contacts (Figure 1). The dif- 
ference between the head and tail contacts may be attri- 
butable to the larger standard error for the propensities 
for the head contacts (Table 3). The propensity values
were computed for each protein and then averaged; thus,
the larger standard error indicates a larger variance
among the propensity values derived from different
proteins. Indeed, a variety of modes of interaction have
been observed between the protein and lipid head
groups in our dataset. Head groups of lipids often show
disorder in high-resolution X-ray structures even when
their tail groups are observed [40,43]. In our dataset, the
head groups of tightly-bound lipids were completely or
mostly disordered in rhodopsin (1gzm_A), sensory
rhodopsin (1xio_A), succinate:ubiquinone oxidoreductase
SQR (2h89_C) and halorhodopsin (3a7k_A); they were fully
or partially observed but did not form any hydrogen bond
in bacteriorhodopsin (1x0i_1), SQR (1zoy_D), V-type
Na+-ATPase (2bl2_I) and ligand-gated ion channel
GLIC (3eam_C). In other cases, the head groups
were observed and formed hydrogen bonds while the tail
groups were disordered, as in Ca2+-ATPase (2eau_A),
rhomboid protease GlpG (2irv_B), potassium channel
Kir (2wll_D) and nitrate reductase A NarGHI (3egw_C).
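
The link drawn above between a larger standard error and a larger protein-to-protein variance can be illustrated with a short Python sketch. The per-protein propensity values below are hypothetical and chosen only for illustration; they are not taken from Table 3, and the helper name mean_and_se is likewise an assumption.

```python
import math
import statistics


def mean_and_se(values):
    """Return the mean and the standard error of the mean of per-protein values."""
    mean = statistics.mean(values)
    # Standard error = sample standard deviation / sqrt(number of proteins).
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical propensities of one residue type in five proteins (illustrative only).
tail_props = [1.8, 2.0, 1.9, 2.1, 2.2]  # similar across proteins
head_props = [0.4, 3.1, 1.2, 2.8, 0.9]  # strongly protein-dependent

tail_mean, tail_se = mean_and_se(tail_props)
head_mean, head_se = mean_and_se(head_props)
print(f"tail: mean={tail_mean:.2f}, SE={tail_se:.2f}")
print(f"head: mean={head_mean:.2f}, SE={head_se:.2f}")
```

With the same number of proteins in both lists, the more scattered head-group values give the larger standard error, which is the sense in which the standard error reflects variation among individual proteins.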

Experimental studies have shown that differences in 
the chemical composition of the lipid head group affect 
the stability and function of membrane proteins, including
KcsA, MscL, Ca2+-ATPase and others. Considering
all these observations, the role of lipid head-protein 
interactions is likely to vary among different types of 
membrane proteins and this notion is consistent with 
the head contact propensities obtained in this paper, 
which were diverse and more complex than the tail con- 
tact propensities. 

Concentration of TRP at a lipid-water interface for 
anchoring the protein to the membrane 

In both the crystal structure and MD datasets, we 
observed a conspicuously high propensity of TRP resi- 
dues for contacting lipid molecules (Figure 1), indicating 
that TRP favors positions in a membrane protein that 
allow interaction with lipids. 

Although TRP is generally not an abundant residue, 
either in membrane or soluble proteins [16], TRP has 
been reported to occur frequently near the membrane 
boundaries [44-46], as confirmed by our recent statisti- 
cal analysis [16]. Systematic experimental studies using 
model peptides and proteins have also produced a similar
picture [47-50]. (See Killian and von Heijne [51] for
a review; examples of high-resolution structures can be
found in Lee [40].)

The amphiphilic nature of TRP (and also TYR) residues
explains why TRP tends to be located at a water-lipid
interface; these amphiphilic residues are thought to
lock the membrane protein into the correct location
and orientation, like anchors or floats at the
membrane-water interface. Sansom and colleagues have observed
the interfacial anchoring behavior of the amphiphilic 
residues in their MD simulations of both the outer 
membrane protein OmpA and the potassium channel 
KcsA [18]. 

All indications are that the significantly high
propensities of TRP in Figure 1 result from the
combined effect of the generally low abundance and the
amphiphilic nature of TRP.

Conclusions 

We analyzed lipid preferences of membrane proteins at 
atomic resolution, which were divided into those for 
lipid head and tail groups, by using a combination of 
data from crystal structures and MD simulations. The 
results revealed a common pattern of lipid tail-amino 
acid interactions in both datasets, suggesting that 
tightly-bound lipid molecules and lipids in the annular 
shell interact with membrane proteins in a similar man- 
ner, largely explained by general lipophilicity. On the 
other hand, lipid head-amino acid interactions showed a 
more complicated and variable pattern and are likely to 
affect the specific function of individual proteins. We 
also showed that TM and non-TM proteins utilize 
essentially an identical set of amino acids for interacting 
with lipid head and tail groups. 